ESTprep: Preprocessing cDNA Sequence Reads
Abstract
MOTIVATION: Large-scale gene discovery projects depend on highly accurate data. The data must be not only trustworthy but also correctly annotated for the features they contain. Sequence errors are inherent in single-pass reads produced by automated sequencing, such as expressed sequence tags (ESTs), and these errors further complicate the automated identification of features in EST sequences. A tool is therefore needed to prepare the data before advanced annotation processing and submission to public databases.

RESULTS: This paper describes ESTprep, a program designed to preprocess EST sequences. It locates the features present in each EST and passes a sequence only if it meets a set of quality criteria. Using ESTprep has substantially improved the accuracy of EST feature identification and the fidelity of the results submitted to GenBank.

AVAILABILITY: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html
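The two-stage idea in the abstract (first locate features in a read, then pass the read only if it meets quality criteria) can be illustrated with a minimal Python sketch. This is not ESTprep's code or its actual rule set: the cloning-site sequence, the poly(A) run length, the thresholds (`VECTOR_SITE`, `POLYA`, `MIN_INSERT_LEN`, `MAX_N_FRACTION`) and the function `preprocess_est` are all hypothetical, chosen only to make the pattern concrete.

```python
import re
from dataclasses import dataclass

# Hypothetical settings for illustration only; these are NOT ESTprep's
# actual feature definitions or quality thresholds.
VECTOR_SITE = "GAATTC"          # assumed cloning-site sequence preceding the insert
POLYA = re.compile(r"A{12,}")   # assumed minimum length of a poly(A) tail run
MIN_INSERT_LEN = 50             # assumed minimum usable insert length
MAX_N_FRACTION = 0.05           # assumed maximum fraction of ambiguous (N) calls


@dataclass
class EstResult:
    insert_start: int  # first base after the vector site (0 if the site is absent)
    insert_end: int    # start of the first poly(A) run (len(seq) if none is found)
    passed: bool
    reason: str


def preprocess_est(seq: str) -> EstResult:
    """Locate hypothetical features in one read, then apply pass/fail checks."""
    seq = seq.upper()

    # Stage 1: feature location.
    site = seq.find(VECTOR_SITE)
    start = site + len(VECTOR_SITE) if site >= 0 else 0
    tail = POLYA.search(seq, start)  # search only downstream of the vector site
    end = tail.start() if tail else len(seq)
    insert = seq[start:end]

    # Stage 2: quality criteria on the insert between the features.
    if len(insert) < MIN_INSERT_LEN:
        return EstResult(start, end, False, "insert too short")
    if insert.count("N") / len(insert) > MAX_N_FRACTION:
        return EstResult(start, end, False, "too many ambiguous bases")
    return EstResult(start, end, True, "ok")
```

For example, `preprocess_est("TTGAATTC" + "ACGT" * 20 + "A" * 15)` locates the insert between positions 8 and 88 and passes it. A production tool would likely also rely on per-base quality scores rather than only counting `N` calls; that choice is left out here to keep the sketch short.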
Similar resources
Vicinal: a method for the determination of ncRNA ends using chimeric reads from RNA-seq experiments
Non-coding (nc)RNAs are important structural and regulatory molecules. Accurate determination of the primary sequence and secondary structure of ncRNAs is important for understanding their functions. During cDNA synthesis, RNA 3' end stem-loops can self-prime reverse transcription, creating RNA-cDNA chimeras. We found that chimeric RNA-cDNA fragments can also be detected at 5' end stem-loops, a...
Minimap2: fast pairwise alignment for long DNA sequences
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable to process such data at scale or are inefficient at doing so, which presses for the development of new alignment algorithms. Results: Minimap2 is a gene...
LPS: a strategy for the generation of longer DNA sequence fragments from short reads
The output of some modern genome sequencing techniques consists of short DNA fragments known as reads. A disadvantage of short reads is that they may appear at different positions of the original genome sequence. Not recognizing the position of repetitive fragments may generate gaps in the final assembled sequence. This happens because repeated fragments would not be considered as...
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
BACKGROUND Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming; therefore, a fast and efficient sequencing approach was needed. ...
Zseq: An Approach for Preprocessing Next-Generation Sequencing Data
Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, se...
Journal: Bioinformatics
Volume 19, Issue 11
Publication date: 2003